The past few years have seen a surge of interest in the field of probabilistic logic learning 
and statistical relational learning. In this endeavor, many probabilistic logics have been 
developed. ProbLog is a recent probabilistic extension of Prolog motivated by the mining 
of large biological networks. In ProbLog, facts can be labeled with probabilities. These 
facts are treated as mutually independent random variables that indicate whether these 
facts belong to a randomly sampled program. Different kinds of queries can be posed 
to ProbLog programs. We introduce algorithms that allow the efficient execution of these 
queries, discuss their implementation on top of the YAP-Prolog system, and evaluate their 
performance in the context of large networks of biological entities. 
To appear in Theory and Practice of Logic Programming (TPLP) 



1 Introduction 

In the past few years, a multitude of different formalisms combining probabilistic
reasoning with logics, databases, or logic programming has been developed. Prominent
examples include PHA and ICL (Poole 1993b; Poole 2000), PRISM (Sato and Kameya 2001),
SLPs (Muggleton 1995), ProbView (Lakshmanan et al. 1997), CLP(BN) (Santos Costa et al.
2003), CP-logic (Vennekens et al. 2004), Trio (Widom 2005), probabilistic Datalog (pD)
(Fuhr 2000), and probabilistic databases (Dalvi and Suciu 2004). Although these logics
have been traditionally studied in the knowledge representation and database
communities, the focus is now often on a machine learning perspective, which imposes
new requirements. First, these logics must be simple enough to be learnable and at the
same time sufficiently expressive to support interesting probabilistic inferences.
Second, because learning is computationally expensive and requires answering long
sequences of possibly complex queries, inference in such logics must be fast, although
inference in even the simplest probabilistic logics is computationally hard.

In this paper, we study these problems in the context of a simple probabilistic 
logic, ProbLog (De Raedt et al. 2007 ) , which has been used for learning in the con- 
text of large biological networks where edges are labeled with probabilities. Large 
and complex networks of biological concepts (genes, proteins, phenotypes, etc.) can 
be extracted from public databases, and probabilistic links between concepts can 
be obtained by various techniques (Sevon et al. 2006). ProbLog is essentially an 
extension of Prolog where a program defines a distribution over all its possible 
non-probabilistic subprograms. Facts are labeled with probabilities and treated as 
mutually independent random variables indicating whether or not the correspond- 
ing fact belongs to a randomly sampled program. The success probability of a 
query is defined as the probability that it succeeds in such a random subprogram. 
The semantics of ProbLog is not new: it is an instance of the distribution seman- 
tics (Sato 1995). This is a well-known semantics for probabilistic logics that has 
been (re)defined multiple times in the literature, often in a more limited database 
setting; cf. (Dantsin 1991; Poole 1993b; Fuhr 2000; Poole 2000; Dalvi and Suciu
2004). Sato has, however, shown that the semantics is also well-defined in the case
of a countably infinite set of random variables and formalized it in his well-known
distribution semantics (Sato 1995). However, even though relying on the same
semantics, in order to allow efficient inference, systems such as PRISM (Sato and
Kameya 2001) and PHA (Poole 1993b) additionally require all proofs of a query to
be mutually exclusive. Thus, they cannot easily represent the type of network analysis
tasks that motivated ProbLog. ICL (Poole 2000) extends PHA to the case where
proofs need not be mutually exclusive. In contrast to the ProbLog implementation 
presented here, Poole's AILog2, an implementation of ICL, uses a meta-interpreter 
and is not tightly integrated with Prolog. 

We contribute exact and approximate inference algorithms for ProbLog. We 
present algorithms for computing the success and explanation probabilities of a 
query, and show how they can be efficiently implemented combining Prolog inference
with Binary Decision Diagrams (BDDs) (Bryant 1986). In addition to an iterative
deepening algorithm that computes an approximation along the lines of (Poole 1993a),
we further adapt the Monte Carlo approach used by (Sevon et al. 2006) in the context
of biological network inference. These two approximation algorithms
compute an upper and a lower bound on the success probability. We also contribute 
an additional approximation algorithm that computes a lower bound using only the 
k most likely proofs. 

The key contribution of this paper is the tight integration of these algorithms in 
the state-of-the-art YAP-Prolog system. This integration includes several improvements
over the initial implementation used in (De Raedt et al. 2007), which are needed to use
ProbLog to effectively query Sevon's Biomine network (Sevon et al. 2006) containing
about 1,000,000 nodes and 6,000,000 edges, as will be shown in the experiments.

This paper is organised as follows. After introducing ProbLog and its semantics
in Section 2, we present several algorithms for exact and approximate inference in
Section 3. Section 4 then discusses how these algorithms are implemented in
YAP-Prolog, and Section 5 reports on experiments that validate the approach. Finally,
Section 6 concludes and touches upon related work.



2 ProbLog 

A ProbLog program consists of a set of labeled facts $p_i :: c_i$ together with a set
of definite clauses. Each ground instance (that is, each instance not containing
variables) of such a fact $c_i$ is true with probability $p_i$, that is, these facts
correspond to random variables. We assume that these variables are mutually independent.^1
The definite clauses allow one to add arbitrary background knowledge (BK).

Figure 1 shows a small probabilistic graph that we shall use as running example
in the text. It can be encoded in ProbLog as follows:

0.8 :: edge(a,c).    0.7 :: edge(a,b).    0.8 :: edge(c,e).
0.6 :: edge(b,c).    0.9 :: edge(c,d).    0.5 :: edge(e,d).        (1)

Such a probabilistic graph can be used to sample subgraphs by tossing a coin for
each edge. Given a ProbLog program $T = \{p_1 :: c_1, \dots, p_n :: c_n\} \cup BK$ and a finite
set of possible substitutions $\{\theta_{j1}, \dots, \theta_{ji_j}\}$ for each probabilistic fact
$p_j :: c_j$, let $L_T$ denote the maximal set of logical facts that can be added to $BK$,
that is, $L_T = \{c_1\theta_{11}, \dots, c_1\theta_{1i_1}, \dots, c_n\theta_{n1}, \dots, c_n\theta_{ni_n}\}$.
As the random variables corresponding to facts in $L_T$ are mutually independent, the
ProbLog program defines a probability distribution over ground logic programs
$L \subseteq L_T$:

$$P(L|T) = \prod_{c_i\theta_j \in L} p_i \prod_{c_i\theta_j \in L_T \setminus L} (1 - p_i). \tag{2}$$

Since the background knowledge BK is fixed and there is a one-to-one mapping 
between ground definite clause programs and Herbrand interpretations, a ProbLog 
program thus also defines a distribution over its Herbrand interpretations. Sato 
has shown how this semantics can be generalized to the countably infinite case;
we refer to (Sato 1995) for details. For ease of readability, in the remainder of this
paper we will restrict ourselves to the finite case and assume all probabilistic facts 
in a ProbLog program to be ground. We extend our example with the following 
background knowledge: 

path(X,Y) :- edge(X,Y).
path(X,Y) :- edge(X,Z), path(Z,Y).

We can then ask for the probability that there exists a path between two nodes,
say c and d, in our probabilistic graph, that is, we query for the probability that a
randomly sampled subgraph contains the edge from c to d, or the path from c to
d via e (or both of these). Formally, the success probability $P_s(q|T)$ of a query $q$
in a ProbLog program $T$ is the marginal of $P(L|T)$ with respect to $q$, i.e.

$$P_s(q|T) = \sum_{L \subseteq L_T} P(q|L) \cdot P(L|T), \tag{4}$$



^1 If the program contains multiple instances of the same fact, they correspond to different random
variables, i.e. $\{p :: c\}$ and $\{p :: c, p :: c\}$ are different ProbLog programs.

Figure 1. Example of a probabilistic graph: edge labels indicate the probability
that the edge is part of the graph.

where $P(q|L) = 1$ if there exists a $\theta$ such that $L \cup BK \models q\theta$, and $P(q|L) = 0$
otherwise. In other words, the success probability of query $q$ is the probability that
the query $q$ is provable in a randomly sampled logic program.

In our example, 40 of the 64 possible subprograms allow one to prove path(c,d),
namely all those that contain at least the edge from c to d or both the edge from
c to e and from e to d, so the success probability of that query is the sum of the
probabilities of these programs: $P_s(path(c,d)|T) = P(\{ab, ac, bc, cd, ce, ed\}|T) +
\dots + P(\{cd\}|T) = 0.94$, where $xy$ is used as a shortcut for edge(x,y) when listing
elements of a subprogram. We will use this convention throughout the paper.
Clearly, listing all subprograms is infeasible in practice; an alternative approach
will be discussed in Section 3.1.
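
The count of 40 subprograms and the value 0.94 can be checked by brute force on the
running example. The following Python sketch is only an illustration of the definition
of the success probability, not of the ProbLog implementation; the names `EDGES`,
`has_path` and `success_probability` are hypothetical helpers introduced here.

```python
from itertools import product

# Probabilistic edges of the running example (Figure 1).
EDGES = {
    ("a", "c"): 0.8, ("a", "b"): 0.7, ("c", "e"): 0.8,
    ("b", "c"): 0.6, ("c", "d"): 0.9, ("e", "d"): 0.5,
}

def has_path(edges, source, target):
    """True if target is reachable from source via at least one of the given edges."""
    frontier, seen = [source], set()
    while frontier:
        node = frontier.pop()
        for (x, y) in edges:
            if x != node:
                continue
            if y == target:
                return True
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return False

def success_probability(source, target):
    """Sum P(L|T) over all subprograms L in which path(source, target) is provable."""
    facts = list(EDGES)
    total, count = 0.0, 0
    for choice in product([False, True], repeat=len(facts)):
        sub = [f for f, keep in zip(facts, choice) if keep]
        if has_path(sub, source, target):
            weight = 1.0
            for f, keep in zip(facts, choice):
                weight *= EDGES[f] if keep else 1.0 - EDGES[f]
            total += weight
            count += 1
    return total, count

prob, count = success_probability("c", "d")
print(count, round(prob, 4))
```

The loop visits all $2^6 = 64$ subprograms, which is exactly the exponential
enumeration that becomes infeasible for larger programs.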

A ProbLog program also defines the probability of a specific proof, also called
explanation, of some query $q$, which is again a marginal of $P(L|T)$. Here, an explanation
is a minimal subset of the probabilistic facts that together with the background
knowledge entails $q\theta$ for some substitution $\theta$. Thus, the probability of such
an explanation $E$ is that of sampling a logic program $L \cup E$ that contains at least
all the probabilistic facts in $E$, that is, the marginal with respect to these facts:

$$P(E|T) = \sum_{L \subseteq (L_T \setminus E)} P(L \cup E|T) = \prod_{c_i \in E} p_i \tag{5}$$

The explanation probability $P_x(q|T)$ is then defined as the probability of the most
likely explanation or proof of the query $q$

$$P_x(q|T) = \max_{E \in E(q)} P(E|T) = \max_{E \in E(q)} \prod_{c_i \in E} p_i,$$

where $E(q)$ is the set of all explanations for query $q$, i.e., all minimal sets $E \subseteq L_T$
of probabilistic facts such that $E \cup BK \models q$ (Kimmig et al. 2007).

In our example, the set of all explanations for path(c,d) contains the edge from
c to d (with probability 0.9) as well as the path consisting of the edges from c to e
and from e to d (with probability $0.8 \cdot 0.5 = 0.4$). Thus, $P_x(path(c,d)|T) = 0.9$.

The ProbLog semantics is essentially a distribution semantics (Sato 1995). Sato
has rigorously shown that this class of programs defines a joint probability
distribution over the set of possible least Herbrand models of the program (allowing
functors), that is, of the background knowledge BK together with a subprogram
$L \subseteq L_T$; for further details we refer to (Sato 1995). The distribution semantics
has been used widely in the literature, though often under other names or in a more
limited database setting; cf. (Dantsin 1991; Poole 1993b; Fuhr 2000; Poole 2000; Dalvi
and Suciu 2004).



3 Inference in ProbLog 

This section discusses algorithms for computing exactly or approximately the suc- 
cess and explanation probabilities of ProbLog queries. It additionally contributes a 
new algorithm for Monte Carlo approximation of success probabilities. 



3.1 Exact Inference 

Calculating the success probability of a query using Equation (4) directly is infeasible
for all but the tiniest programs, as the number of subprograms to be checked
is exponential in the number of probabilistic facts. However, as we have seen in
our example in Section 2, we can describe all subprograms allowing for a specific
proof by means of the facts that such a program has to contain, i.e., all the ground
probabilistic facts used in that proof. As probabilistic facts correspond to random
variables indicating the presence of facts in a sampled program, we alternatively
denote proofs by conjunctions of such random variables. In our example, query
path(c,d) has two proofs in the full program: {edge(c,d)} and {edge(c,e), edge(e,d)},
or, using logical notation, $cd$ and $ce \wedge ed$. The set of all subprograms containing
some proof thus can be described by a disjunction over all possible proofs, in our
case, $cd \vee (ce \wedge ed)$. This idea forms the basis for the inference method presented
in (De Raedt et al. 2007), which uses two steps:



1. Compute the proofs of the query $q$ in the logical part of the theory $T$, that
is, in $BK \cup L_T$. The result will be a DNF formula.

2. Compute the probability of this formula. 



Similar approaches are used for PRISM (Sato and Kameya 2001), ICL (Poole 2000)
and pD (Fuhr 2000).



The probability of a single given proof, cf. Equation (5), is the marginal over
all programs allowing for that proof, and thus equals the product of the probabilities
of the facts used by that proof. However, we cannot directly sum the
results for the different proofs to obtain the success probability, as a specific
subprogram can allow several proofs and therefore contributes to the probability of
each of these proofs. Indeed, in our example, all programs that are supersets
of {edge(c,e), edge(e,d), edge(c,d)} contribute to the marginals of both proofs and
would therefore be counted twice if summing the probabilities of the proofs. However,
for mutually exclusive conjunctions, that is, conjunctions describing disjoint
sets of subprograms, the probability is the sum of the individual probabilities. This
situation can be achieved by adding negated random variables to a conjunction,
thereby explicitly excluding subprograms covered by another part of the formula
from the corresponding part of the sum. In the example, extending $ce \wedge ed$ to
$ce \wedge ed \wedge \neg cd$ reduces the second part of the sum to those programs not covered by
the first:

$$\begin{aligned}
P_s(path(c,d)|T) &= P(cd \vee (ce \wedge ed)|T) \\
&= P(cd|T) + P(ce \wedge ed \wedge \neg cd|T) \\
&= 0.9 + 0.8 \cdot 0.5 \cdot (1 - 0.9) = 0.94
\end{aligned}$$

Figure 2. SLD-tree for query path(c,d).

However, as the number of proofs grows, disjoining them gets more involved. Consider
for example the query path(a,d) which has four different but highly interconnected
proofs. In general, this problem is known as the disjoint-sum-problem or the
two-terminal network reliability problem, which is #P-complete (Valiant 1979).
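
To make the overlap concrete, the sketch below evaluates the four proofs of path(a,d)
in two ways: by naively summing the proof probabilities, and by Shannon expansion,
which conditions on one fact at a time and therefore counts every subprogram once.
Shannon expansion is used here only as a simple stand-in for a disjoining procedure;
the names `PROB`, `PROOFS_AD` and `dnf_probability` are hypothetical.

```python
# Fact probabilities, with xy as a shortcut for edge(x, y).
PROB = {"ac": 0.8, "ab": 0.7, "ce": 0.8, "bc": 0.6, "cd": 0.9, "ed": 0.5}

# The four proofs of path(a, d), each a conjunction of facts.
PROOFS_AD = [
    {"ac", "cd"},
    {"ab", "bc", "cd"},
    {"ac", "ce", "ed"},
    {"ab", "bc", "ce", "ed"},
]

def dnf_probability(proofs, prob):
    """Probability that at least one conjunction holds, by Shannon expansion."""
    proofs = [frozenset(p) for p in proofs]
    if any(len(p) == 0 for p in proofs):
        return 1.0                      # an empty conjunction is always true
    if not proofs:
        return 0.0                      # an empty disjunction is always false
    var = min(proofs[0])                # branch on some fact of the first proof
    when_true = [p - {var} for p in proofs]            # fact present
    when_false = [p for p in proofs if var not in p]   # fact absent
    return (prob[var] * dnf_probability(when_true, prob)
            + (1.0 - prob[var]) * dnf_probability(when_false, prob))

def proof_probability(proof, prob):
    """Product of the probabilities of the facts in one proof."""
    result = 1.0
    for fact in proof:
        result *= prob[fact]
    return result

naive_sum = sum(proof_probability(p, PROB) for p in PROOFS_AD)
exact = dnf_probability(PROOFS_AD, PROB)
print(round(naive_sum, 3), round(exact, 5))
```

The naive sum exceeds 1 and so cannot be a probability, whereas the expansion yields
the success probability of path(a,d). Its worst-case cost remains exponential in the
number of facts, in line with the hardness result cited above.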

Before returning to possible approaches to tackle the disjoint-sum-problem at the 
end of this section, we will now discuss the two steps of ProbLog's exact inference 
in more detail. 

Following Prolog, the first step employs SLD-resolution to obtain all different
proofs. As an example, the SLD-tree for the query ?- path(c,d). is depicted in
Figure 2. Each successful proof in the SLD-tree uses a set of ground probabilistic
facts $\{p_1 :: c_1, \dots, p_k :: c_k\} \subseteq T$. These facts are necessary for the proof, and the
proof is independent of other probabilistic facts in T.
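
The first step can be imitated for the running example with a small depth-first search
that follows the two path/2 clauses in order and records the edge facts used along each
successful branch. This is a hand-written Python sketch specialised to path queries, not
a general SLD-resolution engine; `proofs` and `proof_probability` are hypothetical names.

```python
# Probabilistic edges of the running example, in program order.
EDGES = {
    ("a", "c"): 0.8, ("a", "b"): 0.7, ("c", "e"): 0.8,
    ("b", "c"): 0.6, ("c", "d"): 0.9, ("e", "d"): 0.5,
}

def proofs(source, target, used=()):
    """Yield each proof of path(source, target) as a tuple of edges in order of use.

    Like the Prolog program, this recursion only terminates on acyclic graphs.
    """
    # Clause 1: path(X,Y) :- edge(X,Y).
    if (source, target) in EDGES:
        yield used + ((source, target),)
    # Clause 2: path(X,Y) :- edge(X,Z), path(Z,Y).
    for (x, z) in EDGES:
        if x == source:
            yield from proofs(z, target, used + ((x, z),))

def proof_probability(proof):
    """Product of the probabilities of the distinct facts used in the proof."""
    result = 1.0
    for fact in set(proof):
        result *= EDGES[fact]
    return result

proofs_cd = list(proofs("c", "d"))
proofs_ad = list(proofs("a", "d"))
for proof in proofs_cd:
    print(proof, round(proof_probability(proof), 4))
```

For path(c,d) the search finds the two successful branches of the SLD-tree, and for
path(a,d) it finds four proofs.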

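Let us now introduce a Boolean random variable $b_i$ for each ground probabilistic
fact $p_i :: c_i \in T$, indicating whether $c_i$ is in a sampled logic program, that is, $b_i$
has probability $p_i$ of being true.^2 A particular proof of query $q$ involving ground
facts $\{p_1 :: c_1, \dots, p_k :: c_k\} \subseteq T$ is thus represented by the conjunctive formula
$b_1 \wedge \dots \wedge b_k$, which at the same time represents the set of all subprograms containing
these facts. Furthermore, using $E(q)$ to denote the set of proofs or explanations of
the goal $q$, the set of all subprograms containing some proof of $q$ can be denoted
by $\bigvee_{e \in E(q)} \bigwedge_{c_i \in e} b_i$, as the following derivation shows:

$$\begin{aligned}
\bigvee_{e \in E(q)} \bigwedge_{c_i \in e} b_i
&= \bigvee_{e \in E(q)} \Big( \bigwedge_{c_i \in e} b_i \wedge \bigwedge_{c_i \in L_T \setminus e} (b_i \vee \neg b_i) \Big) \\
&= \bigvee_{e \in E(q)} \bigvee_{L \subseteq L_T \setminus e} \Big( \bigwedge_{c_i \in e} b_i \wedge \Big( \bigwedge_{c_i \in L} b_i \wedge \bigwedge_{c_i \in L_T \setminus (L \cup e)} \neg b_i \Big) \Big) \\
&= \bigvee_{e \in E(q),\, L \subseteq L_T \setminus e} \Big( \bigwedge_{c_i \in L \cup e} b_i \wedge \bigwedge_{c_i \in L_T \setminus (L \cup e)} \neg b_i \Big) \\
&= \bigvee_{L \subseteq L_T,\, \exists\theta\, L \cup BK \models q\theta} \Big( \bigwedge_{c_i \in L} b_i \wedge \bigwedge_{c_i \in L_T \setminus L} \neg b_i \Big)
\end{aligned}$$

^2 For better readability, we do not write substitutions explicitly here.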

We first add all possible ways of extending a proof e to a full sampled program by 
considering each fact not in e in turn. We then note that the disjunction of these 
fact-wise extensions can be written on the basis of sets. Finally, we rewrite the 
condition of the disjunction in the terms of Equation (4). This is possible as each
subprogram that is an extension of an explanation of $q$ entails some ground instance
of $q$, and vice versa, each subprogram entailing $q$ is an extension of some explanation
of $q$. As the DNF now contains conjunctions representing fully specified programs,
its probability is a sum of products, which directly corresponds to Equation (4):

$$\begin{aligned}
& P\Big( \bigvee_{L \subseteq L_T,\, \exists\theta\, L \cup BK \models q\theta} \Big( \bigwedge_{c_i \in L} b_i \wedge \bigwedge_{c_i \in L_T \setminus L} \neg b_i \Big) \Big) \\
&= \sum_{L \subseteq L_T,\, \exists\theta\, L \cup BK \models q\theta} \Big( \prod_{c_i \in L} p_i \cdot \prod_{c_i \in L_T \setminus L} (1 - p_i) \Big)
\end{aligned}$$

We thus obtain the following alternative characterisation of the success probability:

$$P_s(q|T) = P\Big( \bigvee_{e \in E(q)} \bigwedge_{c_i \in e} b_i \Big) \tag{7}$$

where $E(q)$ denotes the set of proofs or explanations of the goal $q$ and $b_i$ denotes
the Boolean variable corresponding to ground probabilistic fact $p_i :: c_i$. Thus, the
problem of computing the success probability of a ProbLog query can be reduced
to that of computing the probability of a DNF formula.

However, as argued above, due to overlap between different conjunctions, the
proof-based DNF of Equation (7) cannot directly be transformed into a sum of
products. Computing the probability of DNF formulae thus involves solving the
disjoint-sum-problem, and therefore is itself a #P-hard problem. Various algorithms
have been developed to tackle this problem. The pD-engine HySpirit (Fuhr 2000)
uses the inclusion-exclusion principle, which is reported to scale to about ten proofs.
For ICL, which extends PHA by allowing non-disjoint proofs, (Poole 2000) proposes
a symbolic disjoining algorithm, but does not report scalability results. Our
implementation of ProbLog employs Binary Decision Diagrams (BDDs) (Bryant
1986), an efficient graphical representation of a Boolean function over a set of
variables, which scales to tens of thousands of proofs; see Section 4.4 for more details.
PRISM (Sato and Kameya 2001) and PHA (Poole 1993b) differ from the systems
mentioned above in that they avoid the disjoint-sum-problem by requiring the user
to write programs such that proofs are guaranteed to be disjoint.
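
For comparison, the inclusion-exclusion principle mentioned above can be written down
in a few lines. The Python sketch below is a textbook formulation applied to the running
example and makes no claim about how HySpirit implements it; `inclusion_exclusion` is a
hypothetical name. Since it sums over every non-empty subset of proofs, it needs
$2^n - 1$ terms for $n$ proofs.

```python
from itertools import combinations

# Fact probabilities, with xy as a shortcut for edge(x, y).
PROB = {"ac": 0.8, "ab": 0.7, "ce": 0.8, "bc": 0.6, "cd": 0.9, "ed": 0.5}

def inclusion_exclusion(proofs, prob):
    """P(proof_1 or ... or proof_n) as an alternating sum over subsets of proofs."""
    total = 0.0
    for size in range(1, len(proofs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(proofs, size):
            # All proofs in the subset hold iff every fact in their union is present.
            facts = frozenset().union(*subset)
            term = 1.0
            for fact in facts:
                term *= prob[fact]
            total += sign * term
    return total

p_cd = inclusion_exclusion([{"cd"}, {"ce", "ed"}], PROB)
p_ad = inclusion_exclusion(
    [{"ac", "cd"}, {"ab", "bc", "cd"}, {"ac", "ce", "ed"}, {"ab", "bc", "ce", "ed"}],
    PROB,
)
print(round(p_cd, 4), round(p_ad, 5))
```

For path(c,d) the sum is $0.9 + 0.4 - 0.36 = 0.94$, matching the disjoint-sum
computation above.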

On the other hand, as the explanation probability exclusively depends on the 
probabilistic facts used in one proof, it can be calculated using a simple branch- 
and-bound approach based on the SLD-tree, where partial proofs are discarded if 
their probability drops below that of the best proof found so far. 
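
A minimal sketch of this branch-and-bound search for path queries on the running
example is given below. It is written in Python for illustration only; the function
`best_explanation` and its pruning test are assumptions that follow the description
above rather than the actual YAP-based code.

```python
# Probabilistic edges of the running example, in program order.
EDGES = {
    ("a", "c"): 0.8, ("a", "b"): 0.7, ("c", "e"): 0.8,
    ("b", "c"): 0.6, ("c", "d"): 0.9, ("e", "d"): 0.5,
}

def best_explanation(source, target):
    """Return (probability, edges) of the most likely proof of path(source, target)."""
    best = {"prob": 0.0, "proof": None}

    def extend(node, proof, prob):
        for (x, y), p in EDGES.items():
            if x != node or (x, y) in proof:
                continue
            new_prob = prob * p
            if new_prob < best["prob"]:
                continue                 # prune: cannot beat the best proof so far
            new_proof = proof + [(x, y)]
            if y == target:
                best["prob"], best["proof"] = new_prob, new_proof
            else:
                extend(y, new_proof, new_prob)

    extend(source, [], 1.0)
    return best["prob"], best["proof"]

prob_cd, proof_cd = best_explanation("c", "d")
prob_ad, proof_ad = best_explanation("a", "d")
print(round(prob_cd, 4), proof_cd)
print(round(prob_ad, 4), proof_ad)
```

Pruning is sound because extending a partial proof multiplies its probability by
further factors of at most 1. For path(a,d), the branch starting with edge(a,b) is
cut off immediately, as 0.7 is already below the 0.72 of the proof found first.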



3.2 Approximative Inference 

As the size of the DNF formula grows with the number of proofs, its evaluation can 
become quite expensive, and ultimately infeasible. For instance, when searching for 
paths in graphs or networks, even in small networks with a few dozen edges there 
are easily $O(10^6)$ possible paths between two nodes. ProbLog therefore includes
several approximation methods. 



3.2.1 Bounded Approximation 



The first approximation algorithm, a slight variant of the one proposed in (De Raedt
et al. 2007), uses DNF formulae to obtain both an upper and a lower bound on the
probability of a query. It is closely related to work by (Poole 1993a) in the context
of PHA, but adapted towards ProbLog. The method relies on two observations.

First, we remark that the DNF formula describing sets of proofs is monotone, 
meaning that adding more proofs will never decrease the probability of the formula 
being true. Thus, formulae describing subsets of the full set of proofs of a query 
will always give a lower bound on the query's success probability. In our example, 
the lower bound obtained from the shorter proof would be $P(cd|T) = 0.9$, while
that from the longer one would be $P(ce \wedge ed|T) = 0.4$.

Our second observation is that the probability of a proof $b_1 \wedge \dots \wedge b_n$ will always
be at most the probability of an arbitrary prefix $b_1 \wedge \dots \wedge b_i$, $i \le n$. In our example,
the probability of the second proof will be at most the probability of its first edge
from c to e, i.e., $P(ce|T) = 0.8 > 0.4$. As disjoining sets of proofs, i.e., including
information on facts that are not elements of the subprograms described by a certain 
proof, can only decrease the contribution of single proofs, this upper bound carries 
over to a set of proofs or partial proofs, as long as prefixes for all possible proofs are 
included. Such sets can be obtained from an incomplete SLD-tree, i.e., an SLD-tree 
where branches are only extended up to a certain point. 

This motivates ProbLog's bounded approximation algorithm. The algorithm relies
on a probability threshold $\gamma$ to stop growing the SLD-tree and thus obtain
DNF formulae for the two bounds.^3 The lower bound formula $d_1$ represents all
proofs with a probability above the current threshold. The upper bound formula
$d_2$ additionally includes all derivations that have been stopped due to reaching the
threshold, as these still may succeed. Our goal is therefore to grow $d_1$ and $d_2$ in
order to decrease $P(d_2|T) - P(d_1|T)$.

Algorithm 1 Bounded approximation using iterative deepening with probability
thresholds.

function BOUNDS(interval width $\delta_p$, initial threshold $\gamma$, constant $\beta \in (0,1)$)
    $d_1$ = False; $d_2$ = False; $P(d_1|T) = 0$; $P(d_2|T) = 1$;
    while $P(d_2|T) - P(d_1|T) > \delta_p$ do
        $p$ = True;
        repeat
            Expand current proof $p$
        until either $p$:
            (a) Fails, in this case backtrack to the remaining choice points;
            (b) Succeeds, in this case set $d_1 := d_1 \vee p$ and $d_2 := d_2 \vee p$;
            (c) $P(p|T) < \gamma$, in this case set $d_2 := d_2 \vee p$
        if $d_2$ == False then
            set $d_2$ = True
        Compute $P(d_1|T)$ and $P(d_2|T)$
        $\gamma := \gamma \cdot \beta$
    return $[P(d_1|T), P(d_2|T)]$

Given an acceptance threshold $\delta_p$, an initial probability threshold $\gamma$, and a shrinking
factor $\beta \in (0,1)$, the algorithm proceeds in an iterative-deepening manner as
outlined in Algorithm 1. Initially, both $d_1$ and $d_2$ are set to False, the neutral
element with respect to disjunction, and the probability bounds are 0 and 1, as we
have no full proofs yet, and the empty partial proof holds in any model.

It should be clear that $P(d_1|T)$ monotonically increases, as the number of proofs
never decreases. On the other hand, as explained above, if $d_2$ changes from one
iteration to the next, this is always because a partial proof $p$ is either removed from
$d_2$ and therefore no longer contributes to the probability, or it is replaced by proofs
$p_1, \dots, p_n$, such that $p_i = p \wedge s_i$, hence $P(p_1 \vee \dots \vee p_n|T) = P(p \wedge s_1 \vee \dots \vee p \wedge s_n|T) =
P(p \wedge (s_1 \vee \dots \vee s_n)|T)$. As proofs are subsets of the probabilistic facts in the ProbLog
program, each literal's random variable appears at most once in the conjunction
representing a proof, even if the corresponding subgoal is called multiple times
when constructing the proof. We therefore know that the literals in the prefix
$p$ cannot be in any suffix $s_i$, hence, given ProbLog's independence assumption,
$P(p \wedge (s_1 \vee \dots \vee s_n)|T) = P(p|T) P(s_1 \vee \dots \vee s_n|T) \le P(p|T)$. Therefore, $P(d_2)$
monotonically decreases.

As an illustration, consider a probability threshold $\gamma = 0.9$ for the SLD-tree in
Figure 2. In this case, $d_1$ encodes the left success path while $d_2$ additionally encodes
the path up to path(e,d), i.e., $d_1 = cd$ and $d_2 = cd \vee ce$, whereas the formula for the
full SLD-tree is $d = cd \vee (ce \wedge ed)$. The lower bound thus is 0.9, the upper bound
(obtained by disjoining $d_2$ to $cd \vee (ce \wedge \neg cd)$) is 0.98, whereas the true probability
is 0.94.

^3 Using a probability threshold instead of the depth bound of (De Raedt et al. 2007) has been
found to speed up convergence, as upper bounds have been found to be tighter on initial levels.

Notice that in order to implement this algorithm we need to compute the probability
of a set of proofs. This task will be described in detail in Section 4.
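
The following Python sketch mimics the bounded approximation on the running example.
It is a simplified illustration under several assumptions: it is specialised to path
queries, it recomputes each pass from scratch, it evaluates DNF formulae by Shannon
expansion instead of BDDs, and it omits the special case in which $d_2$ stays False.
The names `expand`, `bounds` and `dnf_probability` are hypothetical.

```python
# Probabilistic edges of the running example, in program order.
EDGES = {
    ("a", "c"): 0.8, ("a", "b"): 0.7, ("c", "e"): 0.8,
    ("b", "c"): 0.6, ("c", "d"): 0.9, ("e", "d"): 0.5,
}

def dnf_probability(conjunctions):
    """Probability that at least one conjunction of edge facts holds."""
    if any(len(c) == 0 for c in conjunctions):
        return 1.0
    if not conjunctions:
        return 0.0
    var = min(conjunctions[0])
    when_true = [c - {var} for c in conjunctions]
    when_false = [c for c in conjunctions if var not in c]
    return (EDGES[var] * dnf_probability(when_true)
            + (1.0 - EDGES[var]) * dnf_probability(when_false))

def expand(source, target, gamma):
    """One pass with threshold gamma: return (complete proofs, stopped partial proofs)."""
    complete, stopped = [], []

    def walk(node, used, prob):
        for (x, y), p in EDGES.items():
            if x != node or (x, y) in used:
                continue
            new_used, new_prob = used | {(x, y)}, prob * p
            if new_prob < gamma:
                stopped.append(new_used)      # case (c): derivation stopped
                continue
            if y == target:
                complete.append(new_used)     # case (b): proof succeeds
            walk(y, new_used, new_prob)       # keep exploring longer paths

    walk(source, frozenset(), 1.0)
    return complete, stopped

def bounds(source, target, delta, gamma, beta):
    """Shrink gamma by factor beta until the bound interval is at most delta wide."""
    lower, upper = 0.0, 1.0
    while upper - lower > delta:
        complete, stopped = expand(source, target, gamma)
        lower = dnf_probability(complete)               # P(d1|T)
        upper = dnf_probability(complete + stopped)     # P(d2|T)
        gamma *= beta
    return lower, upper

d1, d2_extra = expand("c", "d", 0.9)
first_pass = (dnf_probability(d1), dnf_probability(d1 + d2_extra))
final = bounds("c", "d", delta=0.01, gamma=0.9, beta=0.5)
print(first_pass, final)
```

With threshold 0.9 the first pass gives the bounds 0.9 and 0.98 of the illustration
above; after two further passes with thresholds 0.45 and 0.225 both bounds meet at 0.94.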



3.2.2 K-Best 

Using a fixed number of proofs to approximate the probability allows better control
of the overall complexity, which is crucial if large numbers of queries have to be
evaluated, e.g., in the context of parameter learning. (Gutmann et al. 2008) therefore
introduces the $k$-probability $P_k(q|T)$, which approximates the success probability
by using the $k$-best (that is, the $k$ most likely) explanations instead of all proofs
when building the DNF formula used in Equation (7):

$$P_k(q|T) = P\Big( \bigvee_{e \in E_k(q)} \bigwedge_{b_i \in var(e)} b_i \Big) \tag{8}$$

where $E_k(q) = \{e \in E(q) \mid P_x(e) \ge P_x(e_k)\}$ with $e_k$ the $k$th element of $E(q)$
sorted by non-increasing probability. Setting $k = \infty$ leads to the success probability,
whereas $k = 1$ corresponds to the explanation probability provided that there is a
single best proof. The branch-and-bound approach used to calculate the explanation
probability can directly be generalized to finding the $k$-best proofs; cf. also (Poole
1993b).



To illustrate $k$-probability, we consider again our example graph, but this time
with query path(a,d). This query has four proofs, represented by the conjunctions
$ac \wedge cd$, $ab \wedge bc \wedge cd$, $ac \wedge ce \wedge ed$ and $ab \wedge bc \wedge ce \wedge ed$, with probabilities 0.72, 0.378,
0.32 and 0.168 respectively. As $P_1$ corresponds to the explanation probability $P_x$,
we obtain $P_1(path(a,d)) = 0.72$. For $k = 2$, the overlap between the best two proofs
has to be taken into account: the second proof only adds information if the first one
is absent. As they share edge $cd$, this means that edge $ac$ has to be missing, leading
to $P_2(path(a,d)) = P((ac \wedge cd) \vee (\neg ac \wedge ab \wedge bc \wedge cd)) = 0.72 + (1 - 0.8) \cdot 0.378 =
0.7956$. Similarly, we obtain $P_3(path(a,d)) = 0.8276$ and $P_k(path(a,d)) = 0.83096$
for $k \ge 4$.
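
These values can be reproduced with the short Python sketch below. For simplicity it
starts from the full list of proofs and sorts it, whereas a realistic implementation
would find the $k$ best proofs by branch-and-bound search; the DNF is again evaluated by
Shannon expansion rather than BDDs. All names (`PROOFS`, `k_probability`, ...) are
hypothetical.

```python
# Fact probabilities, with xy as a shortcut for edge(x, y).
PROB = {"ac": 0.8, "ab": 0.7, "ce": 0.8, "bc": 0.6, "cd": 0.9, "ed": 0.5}

# All proofs of path(a, d).
PROOFS = [
    frozenset({"ac", "cd"}),
    frozenset({"ab", "bc", "cd"}),
    frozenset({"ac", "ce", "ed"}),
    frozenset({"ab", "bc", "ce", "ed"}),
]

def proof_probability(proof):
    """Product of the probabilities of the facts in one proof."""
    result = 1.0
    for fact in proof:
        result *= PROB[fact]
    return result

def dnf_probability(conjunctions):
    """Probability that at least one conjunction holds, by Shannon expansion."""
    if any(len(c) == 0 for c in conjunctions):
        return 1.0
    if not conjunctions:
        return 0.0
    var = min(conjunctions[0])
    when_true = [c - {var} for c in conjunctions]
    when_false = [c for c in conjunctions if var not in c]
    return (PROB[var] * dnf_probability(when_true)
            + (1.0 - PROB[var]) * dnf_probability(when_false))

def k_probability(proofs, k):
    """Probability of the DNF built from all proofs at least as likely as the k-th best."""
    ranked = sorted(proofs, key=proof_probability, reverse=True)
    if k >= len(ranked):
        chosen = ranked
    else:
        cutoff = proof_probability(ranked[k - 1])
        chosen = [e for e in ranked if proof_probability(e) >= cutoff]
    return dnf_probability(chosen)

values = [k_probability(PROOFS, k) for k in (1, 2, 3, 4, 10)]
print([round(v, 5) for v in values])
```

The sequence is non-decreasing in $k$ and reaches the success probability once all
four proofs are included.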



3.2.3 Monte Carlo 

As an alternative approximation technique, we propose a Monte Carlo method, 
where we proceed as follows. 
Execute until convergence:

1. Sample a logic program from the ProbLog program

2. Check for the existence of some proof of the query of interest

3. Estimate the query probability P as the fraction of samples where the query
is provable

We estimate convergence by computing the 95% confidence interval at each m
samples. Given a large number N of samples, we can use the standard normal
approximation interval to the binomial distribution:

$$\delta \approx 2 \times \sqrt{\frac{P \cdot (1 - P)}{N}}$$

Notice that confidence intervals do not directly correspond to the exact bounds used
in our previous approximation algorithm. Still, we employ the same stopping criterion,
that is, we run the Monte Carlo simulation until the width of the confidence
interval is at most $\delta_p$.
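
The three steps and the stopping rule can be sketched in Python as follows. This is an
illustration on the running example only. It assumes the textbook normal approximation
$P \pm 1.96 \sqrt{P(1-P)/N}$ and takes the full span of that interval as its width;
the function name `monte_carlo`, the batch size `m` and the fixed seed are choices made
for this sketch.

```python
import math
import random

# Probabilistic edges of the running example (Figure 1).
EDGES = {
    ("a", "c"): 0.8, ("a", "b"): 0.7, ("c", "e"): 0.8,
    ("b", "c"): 0.6, ("c", "d"): 0.9, ("e", "d"): 0.5,
}

def has_path(edges, source, target):
    """True if target is reachable from source via at least one of the given edges."""
    frontier, seen = [source], set()
    while frontier:
        node = frontier.pop()
        for (x, y) in edges:
            if x != node:
                continue
            if y == target:
                return True
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return False

def monte_carlo(source, target, delta=0.02, m=1000, seed=0):
    """Estimate the success probability; stop when the 95% interval is at most delta wide."""
    rng = random.Random(seed)
    hits = samples = 0
    while True:
        for _ in range(m):
            # Step 1: sample a subprogram by tossing a coin for each edge.
            sample = [e for e, p in EDGES.items() if rng.random() < p]
            # Step 2: check whether the query is provable in the sample.
            hits += has_path(sample, source, target)
            samples += 1
        # Step 3: estimate the probability and the interval width after each batch.
        estimate = hits / samples
        width = 2 * 1.96 * math.sqrt(estimate * (1.0 - estimate) / samples)
        if width <= delta:
            return estimate, samples

estimate, samples = monte_carlo("c", "d")
print(round(estimate, 3), samples)
```

Each sample is drawn from scratch, so the samples are independent and the binomial
approximation applies. The estimate should lie close to the exact value 0.94, but it
comes with a confidence interval rather than guaranteed bounds.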

A similar algorithm (without the use of confidence intervals) was also used in
the context of biological networks (not represented as Prolog programs) by (Sevon
et al. 2006). The use of a Monte Carlo method for probabilistic logic programs
was suggested already by (Dantsin 1991), although he neither provides details nor
reports on an implementation. Our approach differs from the MCMC method for
Stochastic Logic Programs (SLPs) introduced by (Cussens 2000) in that we do
not use a Markov chain, but restart from scratch for each sample. Furthermore,
SLPs are different in that they directly define a distribution over all proofs of a 
query. Investigating similar probabilistic backtracking approaches for ProbLog is a 
promising future research direction. 



4 Implementation 

This section discusses the main building blocks used to implement ProbLog on 
top of the YAP-Prolog system. An overview is shown in Figure 3, with a typical
ProbLog program, including ProbLog facts and background knowledge (BK), at 
the top. 

The implementation requires ProbLog programs to use the problog module. Each 
program consists of a set of labeled facts and of unlabeled background knowledge, a 
generic Prolog program. Labeled facts are preprocessed as described below. Notice 
that the implementation requires all queries to non-ground probabilistic facts to be 
ground on calling. 

In contrast to standard Prolog queries, where one is interested in answer substi- 
tutions, in ProbLog one is primarily interested in a probability. As discussed before, 
two common ProbLog queries ask for the most likely explanation and its probabil- 
ity, and the probability of whether a query would have an answer substitution. We 
have discussed two very different approaches to the problem: 

• In exact inference, k-best and bounded approximation, the engine explicitly
reasons about probabilities of proofs. The challenge is how to compute the
probability of each individual proof, store a large number of proofs, and compute
the probability of sets of proofs.

• In Monte Carlo, the probabilities of facts are used to sample from ProbLog
programs. The challenge is how to compute a sample quickly, in a way that
inference can be as efficient as possible.

Figure 3. ProbLog Implementation: A ProbLog program (top) requires the ProbLog
library which in turn relies on functionality from the tries and array libraries.
ProbLog queries (bottom-left) are sent to the YAP engine, and may require calling
the BDD library CUDD via SimpleCUDD.

ProbLog programs execute from a top-level query and are driven through a ProbLog 
query. The inference algorithms discussed above can be abstracted as follows: 

• Initialise the inference algorithm; 

• While probabilistic inference did not converge: 

— initialise a new query; 

— execute the query, instrumenting every ProbLog call in the current proof. 
Instrumentation is required for recording the ProbLog facts required by 
a proof, but may also be used by the inference algorithm to stop proofs 
(e.g., if the current probability is lower than a bound); 

— process success or exit substitution; 

• Proceed to the next step of the algorithm: this may be trivial or may require 
calling an external solver, such as a BDD tool, to compute a probability. 

Notice that the current ProbLog implementation relies on the Prolog engine to
efficiently execute goals. On the other hand, and in contrast to most other probabilistic
language implementations, in ProbLog there is no clear separation between
logical and probabilistic inference: in a fashion similar to constraint logic programming,
probabilistic inference can drive logical inference.

From a Prolog implementation perspective, ProbLog poses a number of interesting
challenges. First, labeled facts have to be efficiently compiled to allow mutual
calls between the Prolog program and the ProbLog engine. Second, for exact inference,
k-best and bounded approximation, sets of proofs have to be manipulated
and transformed into BDDs. Finally, Monte Carlo simulation requires representing
and manipulating samples. We discuss these issues next.

4.1 Source-to-source transformation

We use the term_expansion mechanism to allow Prolog calls to labeled facts, and
for labeled facts to call the ProbLog engine. As an example, the program: 



0.715 :: edge('PubMed_2196878', 'MIM_609065').
0.659 :: edge('PubMed_8764571', 'HGNC_5014').                      (9)

would be compiled as:

edge(A,B) :- problog_edge(ID, A, B, LogProb),
             grounding_id(edge(A,B), ID, GroundID),
             add_to_proof(GroundID, LogProb).

problog_edge(0, 'PubMed_2196878', 'MIM_609065', -0.3348).
problog_edge(1, 'PubMed_8764571', 'HGNC_5014', -0.4166).          (10)



Thus, the internal representation of each fact contains an identifier, the original
arguments, and the logarithm of the probability.^4 The grounding_id procedure
will create and store a grounding specific identifier for each new grounding of a
non-ground probabilistic fact encountered during proving, and retrieve it on repeated
use. For ground probabilistic facts, it simply returns the identifier itself. The
add_to_proof procedure updates the data structure representing the current path
through the search space, i.e., a queue of identifiers ordered by first use, together
with its probability. Compared to the original meta-interpreter based implementation
of (De Raedt et al. 2007), the main benefit of source-to-source transformation
is better scalability, namely by having a compact representation of the facts for
the YAP engine (Santos Costa 2007) and by allowing access to the YAP indexing
mechanism (Santos Costa et al. 2007).



4.2 Proof Manipulation

Manipulating proofs is critical in ProbLog. We represent each proof as a queue con- 
taining the identifier of each different ground probabilistic fact used in the proof, 
ordered by first use. The implementation requires calls to non-ground probabilistic 
facts to be ground, and during proving maintains a table of groundings used within 
the current query together with their identifiers. Grounding identifiers are based 
on the fact's identifier extended with a grounding number, i.e. 5_1 and 5_2 would 
refer to different groundings of the non-ground fact with identifier 5. In our imple- 
mentation, the queue is stored in a backtrackable global variable, which is updated 
by calling add_to_proof with an identifier for the current ProbLog fact. We thus 
exploit Prolog's backtracking mechanism to avoid recomputation of shared proof 
prefixes when exploring the space of proofs. Storing a proof is simply a question of 
adding the value of the variable to a store. 
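
As a rough illustration of this bookkeeping, the following Python sketch mimics a proof queue with grounding identifiers and summed log-probabilities. It is not the YAP implementation: the class `ProofState`, its explicit `undo` method (standing in for Prolog backtracking) and the probability 0.5 used for every fact are assumptions made for the example; only the names `grounding_id` and `add_to_proof` are borrowed from the text.

```python
import math


class ProofState:
    """Toy stand-in for the backtrackable proof queue (illustrative only)."""

    def __init__(self):
        self.queue = []       # identifiers ordered by first use
        self.log_prob = 0.0   # sum of log-probabilities of the queued facts
        self.groundings = {}  # (fact_id, ground_args) -> grounding identifier
        self.counters = {}    # fact_id -> number of groundings seen so far

    def grounding_id(self, fact_id, args, is_ground_fact):
        """Return the fact id for ground facts, else a per-grounding id like '5_1'."""
        if is_ground_fact:
            return str(fact_id)
        key = (fact_id, args)
        if key not in self.groundings:
            self.counters[fact_id] = self.counters.get(fact_id, 0) + 1
            self.groundings[key] = f"{fact_id}_{self.counters[fact_id]}"
        return self.groundings[key]

    def add_to_proof(self, ground_id, log_p):
        """Append a fact on first use; return True if the queue changed."""
        if ground_id in self.queue:
            return False
        self.queue.append(ground_id)
        self.log_prob += log_p
        return True

    def undo(self, ground_id, log_p):
        """Emulate backtracking over the most recent add_to_proof."""
        assert self.queue[-1] == ground_id
        self.queue.pop()
        self.log_prob -= log_p


state = ProofState()
# (fact id, ground arguments, is the fact ground in the program?)
# All facts get the placeholder probability 0.5; these are not values from the paper.
steps = [(3, (), True), (5, ("a",), False), (7, (), True),
         (5, ("b",), False), (5, ("a",), False)]
for fact_id, args, ground in steps:
    state.add_to_proof(state.grounding_id(fact_id, args, ground), math.log(0.5))

proof = list(state.queue)
proof_probability = math.exp(state.log_prob)
```

The repeated use of the first grounding of fact 5 leaves the queue unchanged, so `proof` has the same shape as the example [3, 5_1, 7, 5_2] used later in the text.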

As we have discussed above, the actual number of proofs can grow very quickly.
ProbLog compactly represents a proof as a list of numbers. We would further like
to have a scalable implementation of sets of proofs, such that we can compute the
joint probability of large sets of proofs efficiently. Our representation for sets of
proofs and our algorithm for computing the probability of such a set are discussed
next.

* We use the logarithm to avoid numerical problems when calculating the probability of a
derivation, which is used to drive inference.



4.3 Sets of Proofs 

Storing and manipulating proofs is critical in ProbLog. When manipulating proofs, 
the key operation is often insertion: we would like to add a proof to an existing set of 
proofs. Some algorithms, such as exact inference or Monte Carlo, only manipulate 
complete proofs. Others, such as bounded approximation, require adding partial 
derivations too. The nature of the SLD-tree means that proofs tend to share both 
a prefix and a suffix. Partial proofs tend to share prefixes only. This suggests using 
tries to maintain the set of proofs. We use the YAP implementation of tries for
this task, based itself on XSB Prolog's work on tries of terms (Ramakrishnan et al.
1999), which we briefly summarize here.



Tries (Fredkin 1962) were originally invented to index dictionaries, and have
since been generalised to index recursive data structures such as terms. Please refer
to (Bachmair et al. 1993; Graf 1996; Ramakrishnan et al. 1999) for the use of tries in
automated theorem proving, term rewriting and tabled logic programs. An essential
property of the trie data structure is that common prefixes are stored only once.
A trie is a tree structure where each different path through the trie data units,
the trie nodes, corresponds to a term described by the tokens labelling the nodes
traversed. For example, the tokenized form of the term f(g(a), 1) is the sequence of
4 tokens: f/2, g/1, a and 1. Two terms with common prefixes will branch off from
each other at the first distinguishing token.

The trie's internal nodes are four-field data structures, storing the node's token,
a pointer to the node's first child, a pointer to the node's parent and a pointer 
to the node's next sibling, respectively. Each internal node's outgoing transitions 
may be determined by following the child pointer to the first child node and, from 
there, continuing sequentially through the list of sibling pointers. When a list of 
sibling nodes becomes larger than a threshold value (8 in our implementation), we 
dynamically index the nodes through a hash table to provide direct node access and 
therefore optimise the search. Further hash collisions are reduced by dynamically 
expanding the hash tables. Inserting a term requires in the worst case allocating 
as many nodes as necessary to represent its complete path. On the other hand, 
inserting repeated terms requires traversing the trie structure until reaching the 
corresponding leaf node, without allocating any new node. 

In order to minimize the number of nodes when storing proofs in a trie, we use 
Prolog lists to represent proofs. For example, a ProbLog proof [3, 5_1, 7, 5_2] uses 
ground fact 3, a first grounding of fact 5, ground fact 7 and another grounding of 
fact 5, that is, list elements in proofs are always either integers or two integers with 
an underscore in between. 

Figure 4. Using tries to store proofs. Initially, the trie contains the root node only.
Next, we store the proofs: (a) [3, 5_1, 7, 5_2]; (b) [3, 5_1, 9, 7, 5_2]; and (c) [3, 4, 7].

Figure 5. Binary Decision Diagram encoding the DNF formula $cd \lor (ce \land ed)$,
corresponding to the two proofs of query path(c,d) in the example graph. An internal
node labeled xy represents the Boolean variable for the edge between x and y,
solid/dashed edges correspond to values true/false and are labeled with the
probability that the variable takes this value.

Figure 4 presents an example of a trie storing three proofs. Initially, the trie
contains the root node only. Next, we store the proof [3, 5_1, 7, 5_2] and six nodes
(corresponding to six tokens) are added to represent it (Figure 4(a)). The proof
[3, 5_1, 9, 7, 5_2] is then stored which requires seven nodes. As it shares a common
prefix with the previous proof, we save the three initial nodes common to both
representations (Figure 4(b)). The proof [3, 4, 7] is stored next and we save again
the two initial nodes common to all proofs (Figure 4(c)).
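
The node counts in this walk-through can be reproduced with a small Python sketch of trie insertion. It is only an approximation of the data structure described above: a dictionary per node replaces the first-child and sibling pointers (and the hash tables used for long sibling lists), and the name of the list-opening token, `LIST`, is an assumption; only `END_LIST` appears in Figure 4.

```python
LIST, END_LIST = "LIST", "END_LIST"  # marker tokens; the name LIST is assumed


class TrieNode:
    def __init__(self, token, parent=None):
        self.token = token
        self.parent = parent
        self.children = {}  # token -> TrieNode, replaces child/sibling pointers


class ProofTrie:
    def __init__(self):
        self.root = TrieNode(None)

    def insert(self, proof):
        """Insert a proof (list of identifiers); return the number of new nodes."""
        node, created = self.root, 0
        for token in [LIST, *proof, END_LIST]:
            child = node.children.get(token)
            if child is None:
                # Allocate a node only where the path leaves the existing trie.
                child = TrieNode(token, node)
                node.children[token] = child
                created += 1
            node = child
        return created


trie = ProofTrie()
new_nodes = [trie.insert(p) for p in (["3", "5_1", "7", "5_2"],
                                      ["3", "5_1", "9", "7", "5_2"],
                                      ["3", "4", "7"])]
repeated = trie.insert(["3", "4", "7"])  # a repeated proof allocates nothing
```

Under these assumptions the three insertions allocate 6, 4 and 3 nodes: the second proof needs seven tokens but reuses three existing nodes, and the third needs five tokens and reuses two.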



4.4 Binary Decision Diagrams

To efficiently compute the probability of a DNF formula representing a set of proofs,
our implementation represents this formula as a reduced ordered Binary Decision
Diagram (BDD) (Bryant 1986), which can be viewed as a compact encoding of a
Boolean decision tree. Given a fixed variable ordering, a Boolean function f can
be represented as a full Boolean decision tree, where each node on the ith level
is labeled with the ith variable and has two children called low and high. Leaves
are labeled by the outcome of f for the variable assignment corresponding to the
path to the leaf, where in each node labeled x, the branch to the low (high) child is
taken if variable x is assigned 0 (1). Starting from such a tree, one obtains a BDD
by merging isomorphic subgraphs and deleting redundant nodes until no further
reduction is possible. A node is redundant if the subgraphs rooted at its children
are isomorphic. Figure 5 shows the BDD for the existence of a path between c and
d in our earlier example.

Algorithm 2 Translating a trie $T$ representing a DNF to a BDD generation script.
Replace($T$, $C$, $n_i$) replaces each occurrence of $C$ in $T$ by $n_i$.
function Translate(trie $T$)
    $i := 1$
    while $\neg leaf(T)$ do
        $S_\land := \{(C, P) \mid C$ leaf in $T$ and single child of its parent $P\}$
        for all $(C, P) \in S_\land$ do
            write $n_i = P \land C$
            $T :=$ Replace($T$, $(C, P)$, $n_i$)
            $i := i + 1$
        $S_\lor := \{[C_1, \ldots, C_n] \mid$ leaves $C_j$ are all the children of some parent $P$ in $T\}$
        for all $[C_1, \ldots, C_n] \in S_\lor$ do
            write $n_i = C_1 \lor \ldots \lor C_n$
            $T :=$ Replace($T$, $[C_1, \ldots, C_n]$, $n_i$)
            $i := i + 1$
    write $top = n_{i-1}$

We use SimpleCUDD as a wrapper tool for the BDD package CUDD to construct
and evaluate BDDs. More precisely, the trie representation of the DNF is
translated to a BDD generation script, which is processed by SimpleCUDD to build
the BDD using CUDD primitives. It is executed via Prolog's shell utility, and results
are reported via shared files.

SimpleCUDD: http://www.cs.kuleuven.be/~theo/tools/simplecudd.html
CUDD: http://vlsi.colorado.edu/~fabio/CUDD

Figure 6. Translating the DNF for path(a,d).

Algorithm 3 Calculating the probability of a BDD.
function Probability(BDD node $n$)
    if $n$ is the 1-terminal then return 1
    if $n$ is the 0-terminal then return 0
    let $h$ and $l$ be the high and low children of $n$
    $prob(h) :=$ call Probability($h$)
    $prob(l) :=$ call Probability($l$)
    return $p_n \cdot prob(h) + (1 - p_n) \cdot prob(l)$

During the generation of the code, it is crucial to exploit the structure sharing
(prefixes and suffixes) already in the trie representation of a DNF formula, otherwise
CUDD computation time becomes extremely long or memory overflows quickly.
Since CUDD builds BDDs by joining smaller BDDs using logical operations, the trie
is traversed bottom-up to successively generate code for all its subtrees. Algorithm 2
gives the details of this procedure. Two types of operations are used to combine
nodes. The first creates conjunctions of leaf nodes and their parent if the leaf is
a single child, the second creates disjunctions of all child nodes of a node if these
child nodes are all leaves. In both cases, a subtree that occurs multiple times in the
trie is translated only once, and the resulting BDD is used for all occurrences of
that subtree. Because of the optimizations in CUDD, the resulting BDD can have
a very different structure than the trie. The translation for query path(a,d) in our
example graph is illustrated in Figure 6; it results in the following script:

    $n_1 = ce \land ed$
    $n_2 = cd \lor n_1$
    $n_3 = ac \land n_2$
    $n_4 = bc \land n_2$
    $n_5 = ab \land n_4$
    $n_6 = n_3 \lor n_5$
    $top = n_6$
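
The bottom-up translation with sharing of repeated subtrees can be sketched in Python as follows. This is not the round-based loop of Algorithm 2: the sketch uses a depth-first recursion with memoisation, which happens to produce the same definitions in the same order for this example. The nested-dictionary encoding of the trie (an empty dictionary marks the end of a proof) and the function name `translate` are assumptions for illustration, and the output uses the Unicode operators ∧ and ∨.

```python
def translate(trie):
    """Return BDD-script lines for a proof trie given as nested dicts.

    A key is a fact label; an empty dict marks the end of a proof.
    """
    lines, memo = [], {}

    def define(op, operands):
        # Reuse the intermediate node if the same operation was already written.
        key = (op, operands)
        if key not in memo:
            memo[key] = f"n{len(memo) + 1}"
            separator = f" {op} "
            lines.append(memo[key] + " = " + separator.join(operands))
        return memo[key]

    def disjunction(children):
        # Alternatives below a node; a single alternative needs no new node.
        names = tuple(conjunction(label, sub) for label, sub in children.items())
        return names[0] if len(names) == 1 else define("∨", names)

    def conjunction(label, sub):
        # A fact followed by the rest of the proofs that continue through it.
        return label if not sub else define("∧", (label, disjunction(sub)))

    lines.append("top = " + disjunction(trie))
    return lines


# Trie for the four proofs of path(a,d), read off the script above.
suffix = {"cd": {}, "ce": {"ed": {}}}
path_a_d = {"ac": suffix, "ab": {"bc": suffix}}
script = translate(path_a_d)
```

Because identical operations are looked up in `memo`, the shared suffix below ac and bc is written only once, as n1 and n2.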



After CUDD has generated the BDD, the probability of a formula is calculated
by traversing the BDD, in each node summing the probability of the high and low
child, weighted by the probability of the node's variable being assigned true and false
respectively, cf. Algorithm 3. Intermediate results are cached, and the algorithm has
a time and space complexity linear in the size of the BDD. For illustration, consider
again Figure 5. The algorithm starts by assigning probabilities 0 and 1 to the 0- and
1-leaf respectively. The node labeled ed has probability $0.5 \cdot 1 + 0.5 \cdot 0 = 0.5$, node
ce has probability $0.8 \cdot 0.5 + 0.2 \cdot 0 = 0.4$; finally, node cd, and thus the entire
formula, has probability $0.9 \cdot 1 + 0.1 \cdot 0.4 = 0.94$.
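
The calculation above can be replayed with a short Python sketch of the cached recursive traversal. The dictionary encoding of the BDD of Figure 5 and the use of `functools.lru_cache` for caching are choices made for this example, not a description of the SimpleCUDD code.

```python
from functools import lru_cache

# Each internal node: name -> (probability of the variable, high child, low child).
# Terminals are the integers 1 and 0. This encodes the BDD of Figure 5.
BDD = {
    "cd": (0.9, 1, "ce"),
    "ce": (0.8, "ed", 0),
    "ed": (0.5, 1, 0),
}


@lru_cache(maxsize=None)  # cache intermediate results, one entry per node
def probability(node):
    if node == 1:
        return 1.0
    if node == 0:
        return 0.0
    p, high, low = BDD[node]
    # Weight the high branch by p and the low branch by 1 - p.
    return p * probability(high) + (1 - p) * probability(low)


result = probability("cd")
```

With caching, each node is evaluated once, which is what makes the traversal linear in the size of the BDD.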

4.5 Monte Carlo

The Monte Carlo implementation is shown in Algorithm 4. It receives a query q,
an acceptance threshold $\delta_p$ and a constant m determining the number of samples
generated per iteration. At the end of each iteration, it estimates the probability
p as the fraction of programs sampled over all previous iterations that entailed
the query, and the confidence interval width to be used in the stopping criterion
as explained in Section 3.2.3. Monte Carlo execution is quite different from the
approaches discussed before, as the two main steps are (a) generating a sample
program and (b) performing standard refutation on the sample. Thus, instead of
combining large numbers of proofs, we need to manipulate large numbers of different
programs or samples.

Algorithm 4 Monte Carlo Inference.
function MonteCarlo(query $q$, interval width $\delta_p$, constant $m$)
    $c = 0$; $i = 0$; $p = 0$; $\delta = 1$;
    while $\delta > \delta_p$ do
        Generate a sample $P'$;
        if $P' \models q$ then
            $c := c + 1$;
        $i := i + 1$;
        if $i \bmod m == 0$ then
            $p := c/i$
            $\delta :=$ width of the confidence interval for $p$ after $i$ samples (see Section 3.2.3)
    return $p$
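
A minimal Python sketch of this sampling loop is given below. The stopping rule `2 * sqrt(p * (1 - p) / i)` is an assumption: it is a common normal-approximation estimate of the interval width and stands in for the criterion of Section 3.2.3, which is not reproduced here. The query is the path(c,d) example with the edge probabilities of Figure 5; the function names are hypothetical.

```python
import math
import random


def monte_carlo(sample_entails_query, delta_p, m, rng):
    """Estimate a success probability following the loop of Algorithm 4."""
    c = i = 0
    p, delta = 0.0, 1.0
    while delta > delta_p:
        if sample_entails_query(rng):
            c += 1
        i += 1
        if i % m == 0:
            p = c / i
            # Assumed stopping rule: approximate width of the confidence interval.
            delta = 2 * math.sqrt(p * (1 - p) / i)
    return p, i


EDGES = {"cd": 0.9, "ce": 0.8, "ed": 0.5}  # edge probabilities from Figure 5


def path_c_d(rng):
    """Draw a complete sample of the three edges and test path(c,d) on it."""
    sample = {edge: rng.random() < prob for edge, prob in EDGES.items()}
    return sample["cd"] or (sample["ce"] and sample["ed"])


estimate, samples = monte_carlo(path_c_d, delta_p=0.01, m=1000, rng=random.Random(0))
```

The estimate should lie close to the exact value 0.94 computed from the BDD, and the number of samples is always a multiple of m because the stopping criterion is only checked every m samples.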

Our first approach was to generate a complete sample and to check for a proof.
In order to accelerate the process, proofs were cached in a trie to skip inference on
a new sample. If no proof exists in the cache, we call the standard Prolog refutation
procedure. Although this approach works rather well for small databases, it does
not scale to larger databases where just generating a new sample requires walking
through millions of facts.

We observed that even in large programs proofs are often quite short, i.e., we
only need to verify whether facts from a small fragment of the database are in
the sample. This suggests that it may be a good idea to take advantage of the
independence between facts and generate the sample lazily: we verify whether a
fact is in the sample only when we need it for a proof. YAP represents samples
compactly as a three-valued array with one field for each fact, where 0 means the
fact was not yet sampled, 1 it was already sampled and belongs to the sample, 2 it
was already sampled and does not belong to the sample. In this implementation:

1. New samples are generated by resetting the sampling array.

2. At every call to add_to_proof, given the current ProbLog literal f:

(a) if s[f] == 0, s[f] := sample(f);

(b) if s[f] == 1, succeed;

(c) if s[f] == 2, fail;

Note that as fact identifiers are used to access the array, the approach cannot
directly be used for non-ground facts. The current implementation of Monte Carlo
therefore uses the internal database to store the result of sampling different
groundings of such facts.
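
The lazy sampling scheme for ground facts can be sketched in Python as follows. The class `LazySample`, the `draws` counter and the fact probabilities are assumptions introduced for the example; the three array values follow the encoding described above.

```python
import random

NOT_SAMPLED, IN_SAMPLE, NOT_IN_SAMPLE = 0, 1, 2


class LazySample:
    def __init__(self, probabilities, rng):
        self.probabilities = probabilities  # indexed by fact identifier
        self.rng = rng
        self.state = [NOT_SAMPLED] * len(probabilities)
        self.draws = 0                      # number of facts actually sampled

    def reset(self):
        """Start a new sample by clearing the array (step 1)."""
        self.state = [NOT_SAMPLED] * len(self.probabilities)

    def add_to_proof(self, f):
        """Succeed (True) iff fact f belongs to the current sample (step 2)."""
        if self.state[f] == NOT_SAMPLED:
            self.draws += 1
            in_sample = self.rng.random() < self.probabilities[f]
            self.state[f] = IN_SAMPLE if in_sample else NOT_IN_SAMPLE
        return self.state[f] == IN_SAMPLE


# 1003 facts with made-up probabilities; only fact 0 is needed by the "proof".
sample = LazySample([0.9, 0.8, 0.5] + [0.5] * 1000, random.Random(1))
first = sample.add_to_proof(0)
second = sample.add_to_proof(0)  # repeated use: same answer, no new draw
touched = sample.draws
```

Only the facts a proof actually touches are drawn, so the cost of a sample depends on the length of the proofs explored rather than on the size of the database.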



5 Experiments 

We performed experiments with our implementation of ProbLog in the context of
the biological network obtained from the Biomine project (Sevon et al. 2006). We
used two subgraphs extracted around three genes known to be connected to the
Alzheimer disease (HGNC numbers 983, 620 and 582) as well as the full network.
The smaller graphs were obtained by querying Biomine for best paths of length 2
(resulting in graph Small) or all paths of length 3 (resulting in graph Medium)
starting at one of the three genes. Small contains 79 nodes and 144 edges, Medium
5220 nodes and 11532 edges. We used Small for a first comparison of our algorithms
on a small scale network where success probabilities can be calculated exactly.
Scalability was evaluated using both Medium and the entire Biomine network
with roughly 1,000,000 nodes and 6,000,000 edges. In all experiments, we queried
for the probability that two of the gene nodes mentioned above are connected,
that is, we used queries such as path('HGNC_983','HGNC_620',Path). We used
the following definition of an acyclic path in our background knowledge:



    path(X, Y, A) :- path(X, Y, [X], A).
    path(X, X, A, A).
    path(X, Y, A, R) :- X \== Y,
                        edge(X, Z),
                        absent(Z, A),
                        path(Z, Y, [Z|A], R).                          (11)
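
For readers less familiar with Prolog, the same accumulator-based search can be sketched in Python. The generator `paths` and the small directed graph are hypothetical; as in the Prolog definition, the list of visited nodes is built with the most recent node first, so each result is the path in reverse order.

```python
def paths(edges, x, y, visited=None):
    """Yield acyclic paths from x to y as node lists, most recent node first."""
    visited = [x] if visited is None else visited
    if x == y:
        yield visited
        return
    for z in edges.get(x, ()):
        if z not in visited:  # plays the role of absent(Z, A)
            yield from paths(edges, z, y, [z] + visited)


# Hypothetical directed graph; the edge e -> c would close a cycle and is skipped.
EDGES = {"c": ["d", "e"], "e": ["d", "c"]}
found = list(paths(EDGES, "c", "d"))
```

The membership test on the list is linear in the length of the current path, which is the cost that motivates the alternative definition discussed next.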



As list operations to check for the absence of a node get expensive for long paths, 
we consider an alternative definition for use in Monte Carlo. It provides cheaper 
testing by using the internal database of YAP to store nodes on the current path 
under key visited: 



    memopath(X, Y, A) :- eraseall(visited),
                         memopath(X, Y, [X], A).
    memopath(X, X, A, A).
    memopath(X, Y, A, R) :- X \== Y,
                            edge(X, Z),
                            recordzifnot(visited, Z, _),
                            memopath(Z, Y, [Z|A], R).                  (12)

path     983 - 620            983 - 582            620 - 582
k        Tp     Tb     P      Tp     Tb     P      Tp     Tb     P
exact    670    450    0.11   8060   659    0.11   630    721    1.00

Table 1. k-probability on Small.



Finally, to assess performance on the full network for queries with smaller
probabilities, we use the following definition of paths with limited length:

    lenpath(N, X, Y, Path) :- lenpath(N, X, Y, [X], Path).
    lenpath(N, X, X, A, A) :- N >= 0.
    lenpath(N, X, Y, A, P) :- X \== Y,
                              N > 0,
                              edge(X, Z),
                              absent(Z, A),
                              NN is N - 1,
                              lenpath(NN, Z, Y, [Z|A], P).             (13)



All experiments were performed on a Core 2 Duo 2.4 GHz 4 GB machine running
Linux. All times reported are in msec and do not include the time to load the graph
into Prolog. The latter takes 20, 200 and 78140 msec for Small, Medium and
Biomine respectively. Furthermore, as YAP indexes the database at query time,
we query for the explanation probability of path('HGNC_620','HGNC_582',Path)
before starting runtime measurements. This takes 0, 50 and 25900 msec for Small,
Medium and Biomine respectively. We report Tp, the time spent by ProbLog to
search for proofs, as well as Tb, the time spent to execute BDD programs (whenever
meaningful). We also report the estimated probability P. For approximate inference
using bounds, we report exact intervals for P, and also include the number n of
BDDs constructed. We set both the initial threshold and the shrinking factor to
0.5. We computed k-probability for k = 1, 2, ..., 1024. In the bounding algorithms,
the error interval ranged between 10% and 1%. Monte Carlo recalculates confidence
intervals after m = 1000 samples. We also report the number S of samples used.

path      983 - 620                    983 - 582                       620 - 582
$\delta$  Tp   Tb   n   P              Tp    Tb     n    P             Tp   Tb     n   P
0.10           48   4   [0.07,0.12]    10    74     6    [0.06,0.11]        25     2   [0.91,1.00]
0.05           71   6   [0.07,0.11]          75     6    [0.06,0.11]        486    4   [0.98,1.00]
0.01           83   7   [0.11,0.11]    140   3364   10   [0.10,0.11]   60   1886   6   [1.00,1.00]

Table 2. Inference using bounds on Small.

Table 3. Monte Carlo Inference on Small.



Small Sized Sample We first compared our algorithms on Small. Table 1 shows the
results for k-probability and exact inference. Note that nodes 620 and 582 are close
to each other, whereas node 983 is farther apart. Therefore, connections involving
the latter are less likely. In this graph, we obtained good approximations using a
small fraction of proofs (the queries have 13136, 155695 and 16048 proofs
respectively). Our results also show a significant increase in running times as ProbLog
explores more paths in the graph, both within the Prolog code and within the
BDD code. The BDD running times can vary widely: we may actually have large
running times for smaller BDDs, depending on BDD structure. However, using
SimpleCUDD instead of the C++ interface used in (Kimmig et al. 2008) typically
decreases BDD time by at least one or two orders of magnitude.

Table 2 gives corresponding results for bounded approximation. The algorithm
converges quickly, as few proofs are needed and BDDs remain small. Note however
that exact inference is competitive for this problem size. Moreover, we observe
large speedups compared to the implementation with meta-interpreters used in (De
Raedt et al. 2007), where total runtimes to reach $\delta = 0.01$ for these queries were
46234, 206400 and 307966 msec respectively. Table 3 shows the performance of the
Monte Carlo estimator. On Small, Monte Carlo is the fastest approach. Already
within the first 1000 samples a good approximation is obtained.

The experiments on Small thus confirm that the implementation on top of
YAP-Prolog enables efficient probabilistic inference on small sized graphs.



Medium Sized Sample For graph Medium with around 11000 edges, exact inference
is no longer feasible. Table 4 again shows results for the k-probability. Comparing
these results with the corresponding values from Table 1, we observe that the
estimated probability is higher now: this is natural, as the graph has both more
nodes and is more connected, therefore leading to many more possible explanations.
This also explains the increase in running times. Approximate inference using
bounds only reached loose bounds (with differences > 0.2) on queries involving node
'HGNC_983', as upper bound formulae with more than 10 million conjunctions were
encountered, which could not be processed.

Table 4. k-probability on Medium.

memo      983 - 620              983 - 582              620 - 582
$\delta$  S       Tp      P      S       Tp      P      S      Tp     P
0.10      1000    1180    0.78   1000    2130    0.76   1000   1640   1.00
0.05      2000    2320    0.77   2000    4230    0.74   1000   1640   1.00
0.01      29000   33220   0.77   29000   61140   0.77   1000   1670   1.00

Table 5. Monte Carlo Inference using memopath/3 on Medium.

The Monte Carlo estimator using the standard definition of path/3 on Medium
did not complete the first 1000 samples within one hour. A detailed analysis shows
that this is caused by some queries backtracking too heavily. Table 5 therefore
reports results using the memorising version memopath/3. With this improved
definition, Monte Carlo performs well: it obtains a good approximation in a few seconds.
Requiring tighter bounds however can increase runtimes significantly.

Biomine Database The Biomine Database covers hundreds of thousands of
entities and millions of links. On Biomine, we therefore restricted our experiments to
the approximations given by k-probability and Monte Carlo. Given the results on
Medium, we directly used memopath/3 for Monte Carlo. Tables 6 and 7 show the
results on the large network. We observe that on this large graph, the number of
possible paths is tremendous, which implies success probabilities practically equal
to 1. Still, we observe that ProbLog's branch-and-bound search to find the best
solutions performs reasonably also on this size of network. However, runtimes for
obtaining tight confidence intervals with Monte Carlo explode quickly even with
the improved path definition. Given that sampling a program that does not entail
the query is extremely unlikely for the setting considered so far, we performed an
additional experiment on Biomine, where we restrict the number of edges on the
path connecting two nodes to a maximum of 2 or 3. Results are reported in
Table 8. As none of the resulting queries have more than 50 proofs, exact inference is
much faster than Monte Carlo, which needs a higher number of samples to reliably
estimate probabilities that are not close to 1.

path   983 - 620                  983 - 582                    620 - 582
k      Tp        Tb       P       Tp          Tb        P      Tp        Tb       P
512    242,210   730      0.98    501,870     2,744     0.88   23,910    3,444    1.00
1024   364,490   10,597   0.99    1,809,680   100,468   0.93   146,890   10,675

Table 6. k-probability on Biomine.

memo      983 - 620                983 - 582                  620 - 582
$\delta$  S      Tp        P       S      Tp          P       S      Tp          P
0.10      1000   100,700   1.00    1000   1,656,660   1.00    1000   1,696,420   1.00
0.05      1000   100,230   1.00    1000   1,671,880   1.00    1000   1,690,830   1.00
0.01      1000   93,120    1.00    1000   1,710,200   1.00    1000   1,637,320   1.00

Table 7. Monte Carlo Inference using memopath/3 on Biomine.

Altogether, the experiments confirm that our implementation provides efficient 
inference algorithms for ProbLog that scale to large databases. Furthermore, com- 
pared to the original implementation of (De Raedt et al. 2007), we obtain large 
speedups in both the Prolog and the BDD part, thereby opening new perspectives 
for applications of ProbLog. 



len  $\delta$  983 - 620                  983 - 582                  620 - 582
               S       T           P      S       T           P      S       T         P
2    0.10      1000    21,400      0.04   1000    18,720      0.11   1000    19,150    0.58
     0.05      1000    19,770      0.05   1000    20,980      0.10   2000    35,100    0.55
     0.01      6000    112,740     0.04   16000   307,520     0.11   40000   764,700   0.55
     exact             477         0.04           456         0.11           581       0.55
3    0.10      1000    106,730     0.14   1000    105,350     0.33   1000    45,400    0.96
     0.05      1000    107,920     0.14   2000    198,930     0.34   1000    49,950    0.96
     0.01      19000   2,065,030   0.14   37000   3,828,520   0.35   6000    282,400   0.96
     exact             9,413       0.14           9,485       0.35           15,806    0.96

Table 8. Monte Carlo inference for different values of $\delta$ and exact inference using
lenpath/4 with length at most 2 (top) or 3 (bottom) on Biomine. For exact
inference, runtimes include both Prolog and BDD time.



6 Conclusions

ProbLog is a simple but elegant probabilistic logic programming language that
allows one to explicitly represent uncertainty by means of probabilistic facts denoting
independent random variables. The language is a simple and natural extension of
the logic programming language Prolog. We presented an efficient implementation 
of the ProbLog language on top of the YAP-Prolog system that is designed to scale 
to large sized problems. We showed that ProbLog can be used to obtain both expla- 
nation and (approximations of) success probabilities for queries on a large database. 
To the best of our knowledge, ProbLog is the first example of a probabilistic logic 
programming system that can execute queries on such large databases. Due to the 
use of BDDs for addressing the disjoint-sum-problem, the initial implementation of 
ProbLog used in (De Raedt et al. 2007) already scaled up much better than alter- 
native implementations such as Fuhr's pD engine HySpirit (Fuhr 2000). The tight 
integration in YAP-Prolog presented here leads to further speedups in runtime of 
several orders of magnitude. 

Although we focused on connectivity queries and Biomine in this work, similar 
problems are found across many domains; we believe that the techniques presented 
apply to a wide variety of queries and databases because ProbLog provides a clean 
separation between background knowledge and what is specific to the engine. As 
shown for Monte Carlo inference, such an interface can be very useful to improve 
performance as it allows incremental refinement of background knowledge, e.g., 
graph procedures. Initial experiments with Dijkstra's algorithm for finding the ex- 
planation probability are very promising. 

ProbLog is closely related to some alternative formalisms such as PHA and
ICL (Poole 1993b; Poole 2000), pD (Fuhr 2000) and PRISM (Sato and Kameya
2001) as their semantics are all based on Sato's distribution semantics even though
there exist also some subtle differences. However, ProbLog is - to the best of the
authors' knowledge - the first implementation that tightly integrates Sato's original
distribution semantics (Sato 1995) in a state-of-the-art Prolog system without
making additional restrictions (such as the exclusive explanation assumption made
in PHA and PRISM). Like ProbLog, both PRISM and the ICL implementation
AILog2 use a two-step approach to inference, where proofs are collected in the
first phase, and probabilities are calculated once all proofs are known. AILog2 is
a meta-interpreter implemented in SWI-Prolog for didactical purposes, where the
disjoint-sum-problem is tackled using a symbolic disjoining technique (Poole 2000).
PRISM, built on top of B-Prolog, requires programs to be written such that
alternative explanations for queries are mutually exclusive. PRISM uses a meta-interpreter
to collect proofs in a hierarchical datastructure called explanation graph. As proofs
are mutually exclusive, the explanation graph directly mirrors the sum-of-products
structure of probability calculation (Sato and Kameya 2001). ProbLog is the first
probabilistic logic programming system using BDDs as a basic datastructure for
probability calculation, a principle that receives increased interest in the
probabilistic logic learning community, cf. for instance (Riguzzi 2007; Ishihata et al.
2008).

Furthermore, as compared to SLPs (Muggleton 1995), CLP(BN) (Santos Costa
et al. 2003), and BLPs (Kersting and De Raedt 2008), ProbLog is a much simpler
and in a sense more primitive probabilistic programming language. Therefore, the
relationship between probabilistic logic programming and ProbLog is, in a sense,
analogous to that between logic programming and Prolog. From this perspective, it
is our hope and goal to further develop ProbLog so that it can be used as a general
purpose programming language with an efficient implementation for use in
statistical relational learning (Getoor and Taskar 2007) and probabilistic programming (De
Raedt et al. 2008). One important use of such a probabilistic programming language
is as a target language in which other formalisms can be efficiently compiled. For
instance, it has already been shown that CP-logic (Vennekens et al. 2004), a recent
elegant probabilistic knowledge representation language based on a probabilistic
extension of clausal logic, can be compiled into ProbLog (Riguzzi 2007) and it is
well-known that SLPs (Muggleton 1995) can be compiled into Sato's PRISM, which
is closely related to ProbLog. Further evidence is provided in (De Raedt et al. 2008).
Another, related use of ProbLog is as a vehicle for developing learning and mining
algorithms and tools (Kimmig et al. 2007; De Raedt et al. 2008; Gutmann
et al. 2008; Kimmig and De Raedt 2009; De Raedt et al. 2009). In the context
of probabilistic representations (Getoor and Taskar 2007; De Raedt et al. 2008),
one typically distinguishes two types of learning: parameter estimation and
structure learning. In parameter estimation in the context of ProbLog and PRISM, one
starts from a set of queries and the logical part of the program and the problem
is to find good estimates of the parameter values, that is, the probabilities of the
probabilistic facts in the program. (Gutmann et al. 2008) introduces a gradient
descent approach to parameter learning for ProbLog that extends the BDD-based
methods discussed here. In structure learning, one also starts from queries but has
to find the logical part of the program as well. Structure learning is therefore closely
related to inductive logic programming. The limiting factor in statistical relational
learning and probabilistic logic learning is often the efficiency of inference, as
learning requires repeated computation of the probabilities of many queries. Therefore,
improvements on inference in probabilistic programming implementations have an
immediate effect on learning. The above compilation approach also raises the
interesting and largely open question whether not only inference problems for alternative
formalisms can be compiled into ProbLog but whether it is also possible to compile
learning problems for these logics into learning problems for ProbLog.

Finally, as ProbLog, unlike PRISM and PHA, deals with the disjoint-sum-problem,
it is interesting to study how program transformation and analysis techniques could
be used to optimize ProbLog programs, by detecting and taking into account
situations where some conjunctions are disjoint. At the same time, we currently
investigate how tabling, one of the keys to PRISM's efficiency, can be incorporated in
ProbLog (Mantadelis and Janssens 2009; Kimmig et al. 2009).